Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (triplets) that encodes a specific amino acid residue in a polypeptide chain or for the termination of translation (stop codons).
There are 64 different codons (61 codons encoding for amino acids plus 3 stop codons) but only 20 different translated amino acids. The overabundance in the number of codons allows many amino acids to be encoded by more than one codon. Because of such redundancy it is said that the genetic code is degenerate. Different organisms often show particular preferences for one of the several codons that encode the same amino acid- that is, a greater frequency of one will be found than expected by chance. How such preferences arise is a much debated area of molecular evolution.
It is generally acknowledged that codon preferences reflect a balance between mutational biases and natural selection for translational optimization. Optimal codons in fast-growing microorganisms, like Escherichia coli or Saccharomyces cerevisiae (baker's yeast), reflect the composition of their respective genomic tRNA pool. It is thought that optimal codons help to achieve faster translation rates and high accuracy. As a result of these factors, translational selection is expected to be stronger in highly expressed genes, as is indeed the case for the above-mentioned organisms. In other organisms that do not show high growing rates or that present small genomes, codon usage optimization is normally absent, and codon preferences are determined by the characteristic mutational biases seen in that particular genome. Examples of this are Homo sapiens (human) and Helicobacter pylori. Organisms that show an intermediate level of codon usage optimization include Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm) or Arabidopsis thaliana (thale cress).
The nature of the codon usage-tRNA optimization has been fiercely debated. It is not clear whether codon usage drives tRNA evolution or vice versa. At least one mathematical model has been developed where both codon-usage and tRNA-expression co-evolve in feedback fashion (i.e., codons already present in high frequencies drive up the expression of their corresponding tRNAs, and tRNAs normally expressed at high levels drive up the frequency of their corresponding codons), however this model does not seem to yet have experimental confirmation. Another problem is that the evolution of tRNA genes has been a very inactive area of research.
Contents |
Different factors have been proposed to be related to codon usage bias, including gene expression level (reflecting selection for optimizing translation process by tRNA abundance), %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, optimal growth temperature and hypersaline adaptation.[1][2][3]
In the field of bioinformatics and computational biology, many statistical methods have been proposed and used to analyze codon usage bias.[4] Methods such as the 'frequency of optimal codons' (Fop) [5], the Relative Codon Adaptation (RCA) [6] or the 'Codon Adaptation Index' (CAI) [7] are used to predict gene expression levels, while methods such as the 'effective number of codons' (Nc) and Shannon entropy from information theory are used to measure codon usage evenness.[8] Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes.[9] There are many computer programs to implement the statistical analyses enumerated above, including CodonW, GCUA, INCA, etc. Codon optimization has applications in designing synthetic genes and DNA vaccines. Several software packages are available online for this purpose (refer to external links). Optimizing the occurrence of desired/undesired motifs and sequence composition in all possible reveres translated gene sequences increases the search space exponentially w.r.t. gene length. For those reasons, the problem could be addressed using optimization algorithms like genetic algorithms (Sandhu et al., In Silico Biol. 2008;8(2):187-92).